Group 11: Analysis of Prostate Cancer Data

Introduction

Our analysis uses data from a randomised clinical trial by Byar & Greene that compared treatments for patients with stage 3 and stage 4 prostate cancer. Treatment consisted of different doses of diethylstilbestrol (DES). Our aim is to investigate correlations, trends, and predictive models related to prostate cancer and patient outcomes. The data are publicly available at https://hbiostat.org/data/repo/prostate.xls

Description

The initial dataset contains 502 observations of patients with prostate cancer across 18 variables. These variables cover patient demographics, medical history, treatment received, and health status.

The raw data were loaded, cleaned, augmented, described and modelled, and the whole process of arriving at the results is reproducible.

Tidy data

  • For instance, we separate “rx” into three columns: “Treatment regime”, “mg” and “Drug”.
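The separation of “rx” can be sketched as follows. This is a minimal illustration, not the code used in the analysis: the helper name split_rx, the example labels ("0.2 mg estrogen", "placebo") and the way the three columns are filled are all assumptions. Here “Treatment regime” keeps the original label, “mg” holds the numeric dose, and “Drug” holds the compound name.

```python
import re

# Assumed label shape: "<dose> mg <drug>", e.g. "0.2 mg estrogen".
# Labels without a dose (e.g. "placebo") are handled separately below.
RX_PATTERN = re.compile(r"^\s*(?P<mg>\d+(?:\.\d+)?)\s*mg\s+(?P<drug>.+?)\s*$")


def split_rx(rx):
    """Split one rx label into the three tidy columns (hypothetical helper)."""
    label = rx.strip()
    match = RX_PATTERN.match(label)
    if match is None:
        # No dose in the label: leave "mg" empty and reuse the label as the drug.
        return {"Treatment regime": label, "mg": None, "Drug": label}
    return {
        "Treatment regime": label,
        "mg": float(match.group("mg")),
        "Drug": match.group("drug"),
    }


# Example labels are assumptions about what the rx column may contain.
rows = [split_rx(rx) for rx in ["placebo", "0.2 mg estrogen", "5.0 mg estrogen"]]
```

Keeping the dose as a number rather than as part of a text label means it can later be used directly as a numeric predictor or for ordering the treatment groups.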

Principal Component Analysis (PCA)

This PCA had three main steps, outlined below:

  • Looking at the data in PC coordinates.
  • Looking at the rotation matrix.
  • Looking at the variance explained by each PC.
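The three quantities named in these steps can be computed as in the sketch below. It is only an illustration under stated assumptions: it uses NumPy and a singular value decomposition, standardises each variable first, and runs on synthetic random data rather than the trial data. The function name pca and all variable names are hypothetical.

```python
import numpy as np


def pca(X):
    """Return PC scores, rotation matrix and variance explained per PC.

    Assumes every column of X is numeric and has non-zero variance.
    """
    X = np.asarray(X, dtype=float)
    # Centre and scale each variable (an assumption; scaling is optional in PCA).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Z = U * diag(S) * Vt; the rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * S                      # the data in PC coordinates
    rotation = Vt.T                     # columns hold the loadings of each PC
    variance_explained = S**2 / np.sum(S**2)
    return scores, rotation, variance_explained


# Synthetic data for illustration only: 50 observations, 4 variables,
# with the second variable made to correlate with the first.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[:, 1] += 2.0 * X[:, 0]

scores, rotation, variance_explained = pca(X)
```

The first two columns of scores are what a plot in PC coordinates would show, rotation indicates how strongly each original variable contributes to each PC, and variance_explained gives the fraction of the total variance captured by each PC.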

Here, the data are plotted in PC coordinates.